Review #2412

Closed
wants to merge 80 commits into from

Conversation

atobiszei (Collaborator)

No description provided.

dkalinowski and others added 30 commits March 6, 2024 11:55
- fix 404s due to openvino link structure change
- 2023.3 -> 2024 where necessary
- spelling fixes
CVS-135106
---------

Co-authored-by: Trawinski, Dariusz <dariusz.trawinski@intel.com>
* validate class and execute method existence, extend the pyovms.Tensor constructor, fix the issue of finalize not being called, print with flush in demos (see the sketch below)
Fixed bugs in the C-API benchmark app, documented it, and created a demo showcasing its features
---------

Co-authored-by: Trawinski, Dariusz <dariusz.trawinski@intel.com>
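For context, a minimal sketch of a Python node that these validation checks apply to, assuming the OvmsPythonModel class and initialize/execute/finalize hooks used by OVMS Python nodes; the input and output names here are made up:

```python
# Minimal sketch of an OVMS Python node: the class must exist and define
# an execute method, which is what the validation above checks for.
from pyovms import Tensor

class OvmsPythonModel:
    def initialize(self, kwargs: dict):
        print("node initialized", flush=True)  # flush so logs appear promptly

    def execute(self, inputs: list) -> list:
        # Echo the first input back under a new name; the extended Tensor
        # constructor accepts a name and a bytes-like buffer.
        data = bytes(inputs[0])
        return [Tensor("output", data)]

    def finalize(self):
        # Called on node teardown; the fix above ensures this actually runs.
        print("node finalized", flush=True)
```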
* Resolve Python node TODOs
smart building depending on the changed content
parallel test execution
build performance optimization
With the verbose flag enabled, unpacking the boost tar file alone adds ~67,000
lines of messages to the build logs, which makes auditing the build process
challenging.
* Allow flag injection to pugixml

This commit contains a patch that adds CXX and linker flag variables to the
CMakeLists.txt file. The patch is applied during the build so that build flags
can later be injected on the cmake command line.

* exclude header check
* fix Dockerfile sequence
* set UBI as the default base image

---------

Co-authored-by: Steve Grubb <ausearch.1@gmail.com>
* Add string output demo
* Add support for the _contents fields in KServe request inputs for MediaPipe, across all deserialization paths (see the sketch below)

---------

Co-authored-by: atobisze <adrian.tobiszewski@intel.com>
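As a rough illustration of a request exercising these fields, a sketch using the KServe v2 gRPC protos bundled with tritonclient; the model name, input name, and port are hypothetical:

```python
# Sketch of a KServe v2 gRPC request that uses the typed *_contents fields
# instead of raw_input_contents; names and addresses are placeholders.
import grpc
from tritonclient.grpc import service_pb2, service_pb2_grpc

request = service_pb2.ModelInferRequest()
request.model_name = "my_mediapipe_graph"

inp = request.inputs.add()
inp.name = "in"
inp.datatype = "INT32"
inp.shape.extend([1, 3])
inp.contents.int_contents.extend([1, 2, 3])  # one of the *_contents fields

channel = grpc.insecure_channel("localhost:9000")
stub = service_pb2_grpc.GRPCInferenceServiceStub(channel)
response = stub.ModelInfer(request)
```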
* Fixing references

* Fix internal link
rasapala and others added 29 commits May 6, 2024 14:47
* universal_and_benchmark_documentation_updates

* no proxy update

* update benchmark proxy

* add version to ubuntu tag

* revert ubuntu changes

* added localhost

* review
* Dockerfile for Gradio
* monitoring changes in the documentation scope
* preinstall nltk modules (see the sketch after this list)
* default security context set to the ovms account
* improvements in the RAG demo
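A rough sketch of the NLTK preinstall step mentioned above; the resource names and download directory are assumptions, and the actual demo may fetch different data:

```python
# Download NLTK data at image build time so the demo does not fetch it at
# runtime. Resource names and target directory are illustrative only.
import nltk

for resource in ("punkt", "stopwords"):
    nltk.download(resource, download_dir="/usr/share/nltk_data")
```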
CVS-138032

Implementation of the /v3/chat/completions endpoint and forwarding of the HTTP message to the MediaPipe graph.
The data is a std::string for now, to be adjusted in the following tasks (CVS-139240/CVS-140684).
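A minimal client sketch against this endpoint, assuming the server listens on localhost:8000 and serves a model named "llama"; both are placeholders:

```python
# Query the /v3/chat/completions endpoint with the standard OpenAI client.
# Base URL, port, and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v3", api_key="unused")
response = client.chat.completions.create(
    model="llama",
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response.choices[0].message.content)
```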
* CVS-137992_fix_deadline_exceeded_dg2

* add retry for get_model_metadata_request (see the retry sketch after this list)

* add get_model_metadata function

* fix test names

* increase timeout for GetModelStatus
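A sketch of the retry pattern described above for a gRPC GetModelMetadata call; the helper name, retry count, and timeout are made up for illustration:

```python
# Retry GetModelMetadata on DEADLINE_EXCEEDED, as the fix above suggests.
# The stub is a TF Serving PredictionService stub; parameters are illustrative.
import time
import grpc

def get_model_metadata(stub, request, retries=5, timeout_s=10.0):
    for attempt in range(retries):
        try:
            return stub.GetModelMetadata(request, timeout=timeout_s)
        except grpc.RpcError as err:
            if err.code() != grpc.StatusCode.DEADLINE_EXCEEDED or attempt == retries - 1:
                raise
            time.sleep(1.0)  # back off briefly before retrying
```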
https://jira.devtools.intel.com/browse/CVS-139240
Implementation of the conversion of chat completion requests to the HttpPayload struct.
* Fix OVMS status to HTTP status conversion
* add-version-to-ubuntu-os

* fix ovms_pkg link

* BASE_OS_DISTRO

* ovms_pkg os

* updates

* DIST_OS added

* adjust nginx build

* fix nginx

* Update Makefile

Co-authored-by: ngrozae <104074686+ngrozae@users.noreply.github.com>

* Update Makefile

Co-authored-by: ngrozae <104074686+ngrozae@users.noreply.github.com>

---------

Co-authored-by: ngrozae <104074686+ngrozae@users.noreply.github.com>
CVS-139231/CVS-139233

This introduces an LLM calculator that accepts HTTP OpenAI /v3/chat/completions requests and produces compliant responses.
It works in both unary and streaming modes.
A number of parameters are still marked as TODO, but the implementation should be enough to run benchmarks.

Includes a minimal description of how to run the demo.
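A sketch of exercising the streaming mode with the OpenAI client; the endpoint address and model name are placeholders, as above:

```python
# Stream tokens from the /v3/chat/completions endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v3", api_key="unused")
stream = client.chat.completions.create(
    model="llama",
    messages=[{"role": "user", "content": "Write a haiku"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```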
* Add scheduler config in graph options
* Fix centos stream-8
---------

Co-authored-by: Miłosz Żeglarski <milosz.zeglarski@intel.com>
Co-authored-by: ngrozae <104074686+ngrozae@users.noreply.github.com>
CVS-142768

Forwards beam search and multinomial sampling parameters to the CB library; this enables returning more than one completion for beam search (unary only).
Adds profiling traces (minitrace)
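A hypothetical request exercising the forwarded parameters; with beam search, n > 1 now returns multiple completions in unary mode. Endpoint, model name, and parameter values are placeholders:

```python
# Request several completions at once; with beam search this now returns
# more than one choice (unary mode only). Values are illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v3", api_key="unused")
response = client.chat.completions.create(
    model="llama",
    messages=[{"role": "user", "content": "Name a color"}],
    n=3,
    temperature=0.0,
)
for choice in response.choices:
    print(choice.index, choice.message.content)
```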
* Add UTs for llm request conversion
* fix tbb handling for ubuntu20
There is an issue (or feature?) where adding a newly generated token to the token cache can produce a shorter message than the previous one without the new token.

TextStreamer did not expect such behavior.

The fix ignores such an event and makes the generation wait for the next tokens.

It also reduces the number of response chunks by requiring that a chunk contain a space before the cache is sent to the client.
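A minimal sketch of the streaming heuristic described above; the names and structure are illustrative, not the OVMS implementation:

```python
# Sketch of the described fix: ignore the event where detokenizing the cache
# yields shorter text than already sent, and only flush chunks that contain
# a space. The detokenize callable maps a list of token ids to a string.
class TextStreamer:
    def __init__(self, detokenize):
        self.detokenize = detokenize
        self.token_cache = []
        self.sent_len = 0  # characters already sent to the client

    def put(self, token_id):
        self.token_cache.append(token_id)
        text = self.detokenize(self.token_cache)
        if len(text) <= self.sent_len:
            return None  # text got shorter or did not grow: wait for more tokens
        chunk = text[self.sent_len:]
        if " " not in chunk:
            return None  # reduce chunk count: flush only once a space appears
        self.sent_len = len(text)
        return chunk
```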
atobiszei closed this on Jun 11, 2024